test(client): add pathspec coverage for Argo and Step Functions run IDs#3264
test(client): add pathspec coverage for Argo and Step Functions run IDs#3264dheerenmohta wants to merge 2 commits into
Conversation
Fixes Netflix#948 Previously, pathspec validation only checked the number of components (e.g., "FlowName/RunID" has 2 parts). This caused confusing errors when users made typos or used invalid characters, since the error would come later from the metadata provider rather than at creation time. This change adds proper format validation: - Flow/Step/Artifact names must be valid identifiers (start with letter/underscore) - Run IDs and Task IDs must be numeric - Empty components are rejected (e.g., "Flow//Step") - Leading/trailing slashes are rejected The validation provides clear, actionable error messages that tell users exactly what's wrong with their pathspec. For example: - "Invalid flow name '123Flow'. Flow names must start with a letter..." - "Invalid run ID 'abc'. Run IDs must be numeric." Added comprehensive unit tests covering valid and invalid cases.
The existing test suite only covered numeric run IDs, which meant Bug 1 (non-numeric orchestrator run IDs raising MetaflowInvalidPathspec) had no regression coverage. Add test_run_valid_orchestrator_run_ids and test_task_valid_orchestrator_run_and_task_ids to explicitly exercise the "argo-*" and "sfn-*" run ID formats that Argo Workflows and AWS Step Functions produce respectively. Also tighten test_run_invalid_pathspecs to remove cases that assumed purely numeric run IDs (e.g. "MyFlow/abc") since those are now valid orchestrator-style IDs. Relates to Netflix#948
Greptile SummaryThis PR adds per-component pathspec validation to
Confidence Score: 3/5The two new orchestrator-ID tests will fail immediately because the run-ID regex in core.py was never updated to accept non-numeric prefixes; merging as-is adds broken tests to the suite. The added validation method in core.py still uses a digits-only pattern for run IDs, so the very tests this PR introduces to prove the Argo/SFN fix will raise MetaflowInvalidPathspec instead of passing. The companion regex change is absent from core.py, leaving the test file and the production code in direct contradiction. Both changed files need attention: Important Files Changed
Reviews (1): Last reviewed commit: "test(client): add pathspec tests for Arg..." | Re-trigger Greptile |
| # Flow names and step names should be valid Python identifiers (start with letter/underscore) | ||
| _FLOW_NAME_PATTERN = re.compile(r'^[a-zA-Z_][a-zA-Z0-9_]*$') | ||
| # Run IDs and Task IDs are numeric | ||
| _NUMERIC_ID_PATTERN = re.compile(r'^[0-9]+$') |
There was a problem hiding this comment.
Run ID pattern still numeric-only, breaking the new tests
_NUMERIC_ID_PATTERN only matches ^[0-9]+$, so _validate_pathspec_format at lines 347–351 will raise MetaflowInvalidPathspec for every Argo/SFN run ID (e.g. argo-moviestatsflow-68z2h, sfn-my-execution-name). test_run_valid_orchestrator_run_ids and test_task_valid_orchestrator_run_and_task_ids both call the same method and expect those pathspecs to return successfully — those tests will fail against this PR's code as written. The companion fix that widens the run-ID pattern (or introduces a separate _ORCHESTRATOR_RUN_ID_PATTERN) is not present in core.py here.
| # Test invalid run ID | ||
| with self.assertRaises(MetaflowInvalidPathspec) as cm: | ||
| MetaflowObject._validate_pathspec_format("MyFlow/abc", "run") | ||
| error_msg = str(cm.exception) | ||
| self.assertIn("abc", error_msg) | ||
| self.assertIn("run ID", error_msg) | ||
| self.assertIn("numeric", error_msg.lower()) |
There was a problem hiding this comment.
Error-message assertion contradicts orchestrator-ID intent
This block asserts that "MyFlow/abc" raises with "numeric" in the message. Once the validation is fixed to accept argo- / sfn- prefixed IDs (the missing companion change), the error message "Run IDs must be numeric" will no longer be accurate for the broader class of valid formats, and assertIn("numeric", ...) will create a confusing contract. Consider updating this assertion to match whatever the new error message says (e.g. "must be a valid run ID") at the same time the regex is widened.
Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!
| def test_step_invalid_pathspecs(self): | ||
| """Test that invalid step pathspecs are rejected.""" | ||
| invalid_cases = [ | ||
| ("MyFlow/123", "too few components"), | ||
| ("MyFlow/123/start/extra", "too many components"), | ||
| ("MyFlow/abc/start", "non-numeric run ID"), | ||
| ("MyFlow/123/123step", "step name starts with number"), | ||
| ("MyFlow/123/my-step", "step name contains dash"), | ||
| ("MyFlow//start", "empty run ID"), | ||
| ("MyFlow/123//", "empty step name"), | ||
| ] | ||
| for pathspec, description in invalid_cases: | ||
| with self.subTest(pathspec=pathspec, reason=description): | ||
| with self.assertRaises(MetaflowInvalidPathspec): | ||
| MetaflowObject._validate_pathspec_format(pathspec, "step") |
There was a problem hiding this comment.
Step-level orchestrator run ID not covered
test_step_invalid_pathspecs keeps ("MyFlow/abc/start", "non-numeric run ID") as a rejection case, but there is no positive test for a step pathspec that contains an orchestrator run ID (e.g. "MyFlow/argo-myflow-abc12/start"). test_task_valid_orchestrator_run_and_task_ids covers the task level, but the step and artifact levels have the same underlying validation and should also get orchestrator-ID positive cases to prevent future regressions.
Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## master #3264 +/- ##
=========================================
Coverage ? 29.04%
=========================================
Files ? 381
Lines ? 52539
Branches ? 9277
=========================================
Hits ? 15260
Misses ? 36252
Partials ? 1027 ☔ View full report in Codecov by Harness. 🚀 New features to boost your workflow:
|
Problem
test_pathspec_validation.py(added in #948) only covers numeric run IDs. This means the critical regression whereRun('MyFlow/argo-...')raisesMetaflowInvalidPathspechas zero test coverage and would go undetected.Fix
Add two new test methods:
test_run_valid_orchestrator_run_idsVerifies that the following formats are accepted as valid run pathspecs:
"MovieStatsFlow/argo-moviestatsflow-68z2h"(matches the tutorial example in08-autopilot)"MyFlow/sfn-my-execution-name"(AWS Step Functions format)test_task_valid_orchestrator_run_and_task_idsSame coverage for task-level pathspecs where the run ID is an orchestrator-style string.
Also removes test cases from
test_run_invalid_pathspecsthat asserted orchestrator-style strings (e.g."MyFlow/abc") should be rejected — those cases are now intentionally valid.Companion to PR #1 (
fix/client-argo-sfn-run-id-validation).Relates to #948